1 Parallel and object oriented programming: integrating applications with cache and
memory management
Harjinder S. Sandhu 

November 1992 Proceedings of the 1992 conference of the Centre for Advanced 

Studies on Collaborative research - Volume 1 
Publisher: IBM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings

In shared memory multiprocessors with Nonuniform Memory Access (NUMA) 
characteristics, effective caching and memory locality are essential to performance. In
this paper, we argue for a new approach for cache and NUMA memory management 
based upon the integration of application-sharing characteristics with system runtime 
management of shared data. An application's shared data is subdivided into shared 
regions of memory, and the application defines explicitly the operations on those 
regions ... 

2 The shared regions approach to software cache coherence on multiprocessors

Harjinder S. Sandhu, Benjamin Gamsa, Songnian Zhou 
July 1993 ACM SIGPLAN Notices, Proceedings of the fourth ACM SIGPLAN
symposium on Principles and practice of parallel programming PPOPP '93,
Volume 28 Issue 7
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review



The effective management of caches is critical to the performance of applications on 
shared-memory multiprocessors. In this paper, we discuss a technique for software cache 
coherence that is based upon the integration of a program-level abstraction for shared data
with software cache management. The program-level abstraction, called Shared Regions,
explicitly relates synchronization objects with the data they protect. Cache coherence 
algorithms are presented which use the I ... 



3 Hierarchical cache/bus architecture for shared memory multiprocessors
A. W. Wilson

June 1987 Proceedings of the 14th annual international symposium on Computer
architecture
Publisher: ACM Press 

A new, large scale multiprocessor architecture is presented in this paper. The architecture 
consists of hierarchies of shared buses and caches. Extended versions of shared bus 
multicache coherency protocols are used to maintain coherency among all caches in the 
system. After explaining the basic operation of the strict hierarchical approach, a 
clustered system is introduced which distributes the memory among groups of processors. 
Results of simulations are presented which demonstrate that t ... 

4 Hardware fault containment in scalable shared-memory multiprocessors
Dan Teodosiu, Joel Baxter, Kinshuk Govil, John Chapin, Mendel Rosenblum, Mark Horowitz
May 1997 ACM SIGARCH Computer Architecture News, Proceedings of the 24th
annual international symposium on Computer architecture ISCA '97,
Volume 25 Issue 2

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Current shared-memory multiprocessors are inherently vulnerable to faults: any 
significant hardware or system software fault causes the entire system to fail. Unless 
provisions are made to limit the impact of faults, users will perceive a decrease in 
reliability when they entrust their applications to larger machines. This paper shows that 
fault containment techniques can be effectively applied to scalable shared-memory 
multiprocessors to reduce the reliability problems created by increased mach ... 

5 Cache coherence for large scale shared memory multiprocessors
M. Thapar, B. Delagi

May 1990 Proceedings of the second annual ACM symposium on Parallel algorithms
and architectures
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, references, citings, index terms



6 Performance evaluation of a commercial cache-coherent shared memory
multiprocessor
Rajeev Jog, Philip L. Vitale, James R. Callister

April 1990 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
1990 ACM SIGMETRICS conference on Measurement and modeling of
computer systems SIGMETRICS '90, Volume 18 Issue 1
Publisher: ACM Press 

Full text available: pdf(948.46 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes an approximate Mean Value Analysis (MVA) model developed to 
project the performance of a small-scale shared-memory commercial symmetric 
multiprocessor system. The system, based on Hewlett Packard Precision Architecture 
processors, supports multiple active user processes and multiple execution threads within 
the operating system. Using detailed timing for hardware delays, a customized 
approximate closed queueing model is developed for the multiprocessor system ... 

7 Architectural support for scalable speculative parallelization in shared-memory
multiprocessors
Marcelo Cintra, José F. Martínez, Josep Torrellas

May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th
annual international symposium on Computer architecture ISCA '00,
Volume 28 Issue 2






Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index
terms

Speculative parallelization aggressively executes in parallel codes that cannot be fully
parallelized by the compiler. Past proposals of hardware schemes have mostly focused on 
single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their 
small size. Very few schemes have attempted this technique in the context of scalable 
shared-memory systems. In this paper, we present and evaluate a new hardware scheme 
for scalable speculative parallelization. This de ... 






8 Cache coherence for large scale shared memory multiprocessors
Manu Thapar, Bruce Delagi

March 1991 ACM SIGARCH Computer Architecture News, Volume 19 Issue 1
Publisher: ACM Press 

Full text available: pdf(534.44 KB) Additional Information: full citation, index terms



9 Architectural primitives for a scalable shared memory multiprocessor
Joonwon Lee, Umakishore Ramachandran

June 1991 Proceedings of the third annual ACM symposium on Parallel algorithms
and architectures
Publisher: ACM Press 

Full text available: pdf(1.27 MB) Additional Information: full citation, references, citings, index terms



10 CRL: high-performance all-software distributed shared memory
K. L. Johnson, M. F. Kaashoek, D. A. Wallach

December 1995 ACM SIGOPS Operating Systems Review, Proceedings of the fifteenth
ACM symposium on Operating systems principles SOSP '95, Volume 29
Issue 5

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, references, citings, index terms



11 The directory-based cache coherence protocol for the DASH multiprocessor
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, John Hennessy
May 1990 ACM SIGARCH Computer Architecture News, Proceedings of the 17th
annual international symposium on Computer Architecture ISCA '90,
Volume 18 Issue 3a

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index
terms

DASH is a scalable shared-memory multiprocessor currently being developed at Stanford's 
Computer Systems Laboratory. The architecture consists of powerful processing nodes, 
each with a portion of the shared-memory, connected to a scalable interconnection 
network. A key feature of DASH is its distributed directory-based cache coherence 
protocol. Unlike traditional snoopy coherence protocols, the DASH protocol does not rely 
on broadcast; instead it uses point-to-point messages sent between th ... 

12 Multiprocessor cache synchronization: issues, innovations, evolution
P. Bitar, A. M. Despain

June 1986 ACM SIGARCH Computer Architecture News, Proceedings of the 13th
annual international symposium on Computer architecture ISCA '86,
Volume 14 Issue 2

Publisher: IEEE Computer Society Press, ACM Press 

Full text available: pdf(981.60 KB) Additional Information: full citation, abstract, references, citings, index terms

Many options are possible in a cache synchronization (or consistency) scheme for a 
broadcast system. We clarify basic concepts, analyze the handling of shared data, and 
then describe a protocol that we are currently exploring. Finally, we analyze the evolution 
of options that have been proposed under write-in (or write-back) policy. We show how 
our protocol extends this evolution with new methods for efficient busy-wait locking, 
waiting, and unlocking. The ... 

13 ReVive: cost-effective architectural support for rollback recovery in shared-memory
multiprocessors
Milos Prvulovic, Zheng Zhang, Josep Torrellas

May 2002 ACM SIGARCH Computer Architecture News, Proceedings of the 29th
annual international symposium on Computer architecture ISCA '02,
Volume 30 Issue 2

Publisher: IEEE Computer Society, ACM Press 

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings, index terms

This paper presents ReVive, a novel general-purpose rollback recovery mechanism for
shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of 
availability, performance, and hardware cost. ReVive performs checkpointing, logging, 
and distributed parity protection, all memory-based. It enables recovery from a wide class 
of errors, including the permanent loss of an entire node. To maintain high performance, 
ReVive includes specialized hardware that performs frequent o ...

Keywords: fault tolerance, shared-memory multiprocessors, rollback recovery, recovery, 
BER, logging, parity, checkpointing, availability 

14 Compiler and hardware support for cache coherence in large-scale multiprocessors:
design considerations and performance study

Lynn Choi, Pen-Chung Yew 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture ISCA '96,
Volume 24 Issue 2

Publisher: ACM Press 

Full text available: pdf(1.48 MB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we study a hardware-supported, compiler directed (HSCD) cache coherence 
scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf 
microprocessors, such as the Cray T3D. It can be adapted to various cache organizations, 
including multi-word cache lines and byte-addressable architectures. Several system 
related issues, including critical sections, inter-thread communication, and task migration 
have also been addressed. The cost of the required hardware sup ... 

15 Shared memory computing on clusters with symmetric multiprocessors and system
area networks
Leonidas Kontothanassis, Robert Stets, Galen Hunt, Umit Rencuzogullari, Gautam Altekar,
Sandhya Dwarkadas, Michael L. Scott

August 2005 ACM Transactions on Computer Systems (TOCS), volume 23 issue 3 
Publisher: ACM Press 
Full text available: pdf(918.28 KB) Additional Information: full citation, abstract, references, index terms

Cashmere is a software distributed shared memory (S-DSM) system designed for clusters 
of server-class machines. It is distinguished from most other S-DSM projects by (1) the 
effective use of fast user-level messaging, as provided by modern system-area networks, 
and (2) a "two-level" protocol structure that exploits hardware coherence within
multiprocessor nodes. Fast user-level messages change the tradeoffs in coherence 
protocol design; they allow Cashmere to employ a relatively simp ... 

Keywords: Distributed shared memory, relaxed consistency, software coherence 
16 Multigrain shared memory
Donald Yeung, John Kubiatowicz, Anant Agarwal

May 2000 ACM Transactions on Computer Systems (TOCS), Volume 18 Issue 2
Publisher: ACM Press 
Full text available: pdf

Parallel workstations, each comprising tens of processors based on shared memory, 
promise cost-effective scalable multiprocessing. This article explores the coupling of such 
small- to medium-scale shared-memory multiprocessors through software over a local 
area network to synthesize larger shared-memory systems. We call these systems 
Distributed Shared-memory Multiprocessors (DSMPs). This article introduces the design of 
a shared-memory system that uses multiple granularities of sharing, ca ... 

Keywords: distributed memory, symmetric multiprocessors, system of systems 



17 MGS: a multigrain shared memory system
Donald Yeung, John Kubiatowicz, Anant Agarwal 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture ISCA '96,
Volume 24 Issue 2

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index
terms

Parallel workstations, each comprising 10-100 processors, promise cost-effective
general-purpose multiprocessing. This paper explores the coupling of such small- to
medium-scale shared memory multiprocessors through software over a local area network
to synthesize larger shared memory systems. We call these systems Distributed Scalable
Shared-memory Multiprocessors (DSSMPs). This paper introduces the design of a shared
memory system that uses multiple granularities of sharing, and presents an imp ...

18 COMA: an opportunity for building fault-tolerant scalable shared memory
multiprocessors

Christine Morin, Alain Gefflaut, Michel Banatre, Anne-Marie Kermarrec 
May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture ISCA '96,
Volume 24 Issue 2

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Due to the increasing number of their components, Scalable Shared Memory 
Multiprocessors (SSMMs) have a very high probability of experiencing failures. Tolerating 
node failures therefore becomes very important for these architectures particularly if they 
must be used for long-running computations. In this paper, we show that the class of
Cache Only Memory Architectures (COMA) are good candidates for building fault-tolerant
SSMMs. A backward error recovery strategy can be implemented without sign ...

Keywords: Scalable Shared 

19 Performance of database workloads on shared-memory systems with out-of-order
processors
Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, Luiz André Barroso
October 1998 ACM SIGPLAN Notices, ACM SIGOPS Operating Systems Review,
Proceedings of the eighth international conference on Architectural
support for programming languages and operating systems ASPLOS-VIII,
Volume 33, 32 Issue 11, 5

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Database applications such as online transaction processing (OLTP) and decision support 
systems (DSS) constitute the largest and fastest-growing segment of the market for 
multiprocessor servers. However, most current system designs have been optimized to 
perform well on scientific and engineering workloads. Given the radically different
behavior of database workloads (especially OLTP), it is important to re-evaluate key
system design decisions in the context of this important class of applicatio ...

20 Recovery protocols for shared memory database systems
Lory D. Molesky, Krithi Ramamritham

May 1995 ACM SIGMOD Record, Proceedings of the 1995 ACM SIGMOD international
conference on Management of data SIGMOD '95, Volume 24 Issue 2
Publisher: ACM Press 

Full text available: pdf(165 MB) Additional Information: full citation, abstract, references, index terms

Significant performance advantages can be gained by implementing a database system on 
a cache-coherent shared memory multiprocessor. However, problems arise when failures 
occur. A single node (where a node refers to a processor/memory pair) crash may require 
a reboot of the entire shared memory system. Fortunately, shared memory 
multiprocessors that isolate individual node failures are currently being developed. Even 
with these, because of the side effects of the cache coherency protocol, ... 